
Creators/Authors contains: "Dai, Z."


  1. Transformers provide a class of expressive architectures that are extremely effective for sequence modeling. However, the key limitation of transformers is their quadratic memory and time complexity O(L^2) with respect to the sequence length in attention layers, which restricts application in extremely long sequences. Most existing approaches leverage sparsity or low-rank assumptions in the attention matrix to reduce cost, but sacrifice expressiveness. Instead, we propose Combiner, which provides full attention capability in each attention head while maintaining low computation and memory complexity. The key idea is to treat the self-attention mechanism as a conditional expectation over embeddings at each location, and approximate the conditional distribution with a structured factorization. Each location can attend to all other locations, either via direct attention, or through indirect attention to abstractions, which are again conditional expectations of embeddings from corresponding local regions. We show that most sparse attention patterns used in existing sparse transformers are able to inspire the design of such factorization for full attention, resulting in the same sub-quadratic cost (O(L log(L)) or O(L√L)). Combiner is a drop-in replacement for attention layers in existing transformers and can be easily implemented in common frameworks. An experimental evaluation on both autoregressive and bidirectional sequence tasks demonstrates the effectiveness of this approach, yielding state-of-the-art results on several image and text modeling tasks. 
  2. Normalization layers are widely used in deep neural networks to stabilize training. In this paper, we consider the training of convolutional neural networks with gradient descent on a single training example. This optimization problem arises in recent approaches for solving inverse problems such as the deep image prior or the deep decoder. We show that for this setup, channel normalization, which centers and normalizes each channel individually, avoids vanishing gradients, whereas without normalization, gradients vanish, which prevents efficient optimization. This effect prevails in deep single-channel linear convolutional networks, and we show that without channel normalization, gradient descent takes at least exponentially many steps to come close to an optimum. In contrast, with channel normalization, the gradients remain bounded, thus avoiding exploding gradients. 
  3. The Mixed-Reality Integrated Learning Environment (MILE) developed at Florida State University is a virtual reality based, inclusive and immersive e-learning environment that promotes engaging and effective learning interactions for a diversified learner population. MILE uses a large number of interactive Non-Player Characters (NPCs) to represent diverse research-based learner archetypes and groups, and to prompt and provide feedback for in situ teaching practice. The NPC scripts in MILE are written in Linden Scripting Language (LSL), and can be quite complex, creating a significant challenge in the development and maintenance of the system. To address this challenge, we develop NPC_GEN, an automatic NPC script generation tool that takes high-level NPC descriptions as input and automatically produces LSL scripts for NPCs. In this work, we introduce NPCDL, a language that we design for NPC_GEN to give high-level descriptions of NPCs, describe how NPC_GEN translates an NPCDL description into an LSL script, and report a user study of NPC_GEN. The results of our user study indicate that with minimal training, non-technical people are able to write and modify NPCDL descriptions, which can then be used to generate LSL scripts for the NPCs: the development and maintenance of NPCs is greatly simplified with NPC_GEN. 
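
The first abstract above describes self-attention as a conditional expectation over embeddings at each location. The sketch below illustrates only that reading for standard full attention, which has the quadratic O(L^2) cost; it is not the Combiner factorization. It assumes NumPy, and the names `softmax`, `full_attention`, `q`, `k`, `v` are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(scores, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    scores = scores - scores.max(axis=axis, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(q, k, v):
    """Standard scaled dot-product attention (illustrative, not Combiner).

    Row i of `probs` is a distribution p(j | i) over all L locations,
    so output row i is the expectation of the value embeddings under it.
    """
    d = q.shape[-1]
    probs = softmax(q @ k.T / np.sqrt(d))  # shape (L, L): the quadratic term
    return probs @ v, probs

rng = np.random.default_rng(0)
L, d = 6, 4  # hypothetical sequence length and embedding size
q, k, v = (rng.standard_normal((L, d)) for _ in range(3))
out, probs = full_attention(q, k, v)
```

Each row of `probs` sums to 1, and `out[i]` equals the weighted average of the rows of `v` under `probs[i]`. The (L, L) matrix `probs` is the object whose size grows quadratically with sequence length.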